In this study we analyze the dynamics of the evolution of the contact lists of millions of users
of the Skype communication network. We find that egocentric networks evolve heterogeneously in 
time as edge additions and deletions of individuals are grouped in long bursty clusters. We classify 
users by their contact addition dynamics and show that bursty peaks of contact additions are likely 
to appear shortly after user account creation. We also study possible relations between contact 
addition activity and other user-initiated actions. Evidence is put forward that bursty peaks in 
contact addition are associated with both free and paid service adoption events. 



I. INTRODUCTION 

The structure and evolution of human interactions are generally characterized by
heterogeneity in manifold ways and are influenced by several correlations ranging
from the individual level to the global scale [1H [7J [HJ [2B]. Some of these emerging
heterogeneities have been identified as the result of simple processes driven by
microscopic rules. For example, preferential attachment [26] has been shown to
introduce degree heterogeneity and short pathways into an evolving network structure,
while the affinity for local triangle closing [19] induces a high clustering coefficient
and largely overlapping neighborhoods, leading to strongly modular network structures.
At the same time, studies of action sequences, such as communication actions in social
networks, have revealed further mechanisms responsible for inhomogeneities in network
structure and dynamics [TU [TS1 [H]. It has been shown in particular that communication
and topology are not independent, which enhances the importance of weak ties linking
communities [25], and that correlated dynamics of individuals induce bursty temporal
patterns of interactions [T31 HU [TBI [57]. However, for the most part, these studies
draw their conclusions from observations of static network snapshots or from incomplete
temporal sequences of interactions. Only recently has it become possible to follow
the dynamics of large-scale networks at the microscopic level, as a number of datasets
have been collected that contain time-stamped records of all topological actions,
such as the addition and deletion of edges between pairs of users [U [19]. This
development has opened the possibility to study directly the governing microscopic
rules of network evolution in order to confirm previous hypotheses and to explore
new mechanisms.

In particular, the availability of time-stamped records of user registrations, edge
additions and edge deletions allows us to explore in detail the microscopic evolution
of egocentric networks, where an egocentric network consists of an individual user and
their immediate friends or "contacts". The observed evolution of such microscopic
networks can be the aggregate of several processes. Creating a link could be the result
of finding an old friend who is already in the network, but could also signal the real
emergence of a new relationship. Sudden dramatic changes in the egocentric network
could be induced by changes in the ego's social status (e.g. moving to a new place or
starting school) but also by the adoption of new services which open possibilities for
alternative ways of interaction.

marton.karsai@aalto.fi

In this study we characterize the temporal evolution of egocentric networks in a very
large online social network. We present for the first time empirical results on a
dataset that contains anonymized data of hundreds of millions of subscribers of Skype,
one of the largest worldwide online communication systems available. We focus mainly
on the temporal evolution of the trusted social links of individuals, and we detect
correlated bursty periods in their edge addition and deletion dynamics. We also
highlight some possible reasons behind bursty behaviour by looking for correlations
between the observed dynamics and other user-initiated actions. Studying these
phenomena is important not only because it deepens our understanding of
technology-enabled human behaviour, but also because it provides insights for designing
better services and optimizing computational resources.

The rest of the paper is structured as follows. In Section II we give a brief
description of the datasets used, after which in Sections III and IV we present our
main results about the temporal evolution and correlations of egocentric networks.
In Section VI we overview the related work reported in the literature, and finally in
Section VII we summarize our main findings.



II. DATA 

This research is based on a dataset consisting of a 
temporally detailed description of the social network of 
(anonymized) Skype users. For each user, the dataset 
provides the following details: 

• Date of registration of the user 
• For each type of paid service (e.g. PSTN calls),
date when the user first and last used this service 
(whenever applicable). 

• Time series indicating the number of days in each 
month when the user connected to the Skype net- 
work 

• For each type of free service (e.g. Skype-to-Skype 
audio calls, video calls, chat, etc.), time series indi- 
cating the number of days in each month when the 
user used this service. 



elapsed between consecutive additions at $t^a_i$ and $t^a_{i+1}$ or deletions at
$t^d_i$ and $t^d_{i+1}$ of the same user. If this distribution is broad and follows
a power law as

$P(\tau) \sim \tau^{-\gamma}$ (2)

it indicates strong temporal heterogeneities and burstiness; if instead it decays
exponentially, it reflects regular dynamical features. Bursty temporal evolution of
human dynamics has been confirmed in various systems ranging from library loans to
human communication [7J Q3] and, recently, in the evolution of social networks [5].



Additionally, the dataset comprises user connections. 
In the Skype network, when a user adds a friend to 
his/her contact list, the friend may confirm the contact 
invitation or not. Also, at any point in time a user may 
delete a "friend" from their contact list. [7 ] Thus, the 
network evolves by means of the following events: contact 
addition, contact confirmation and contact deletion. 

In order to take into account only trusted social links 
we retained only confirmed edges, meaning edges where 
both parties accepted the connection. Failure to do so 
would lead to mixing undesired with desired connections. 

For the present study we employed two subsets of the above dataset. The first dataset
(DS1) includes every active user as of the end of 2010, all confirmed edges between
these users, and the date of confirmation of each edge. In this context, we define an
active user as one who connected to the Skype network in at least two different months
during the first year after their registration date. In order to consider only users
with a realistic number of friends we selected those with degree between 2 and 1000.
This filtering led to a set of more than 150 million users.

The second dataset (DS2) includes the complete set of edge addition, edge confirmation
and edge deletion events recorded for a year-long period in 2010-2011. Only events
related to users with degree between 2 and 1000 were kept. Unlike DS1, non-active
users were retained in DS2.
FIG. 1. Inter-event time distributions of edge addition (blue squares) and deletion
(red circles) events of users in DS2. The straight line indicates a power-law function
with exponent $\gamma = 0.85$.

Calculating $P(\tau)$ for DS2, we observe broad inter-event time distributions for both
edge additions and edge deletions. The two distributions show rather similar scaling,
with a broad section that fits a power law with exponent $\gamma \simeq 0.85$ and an
exponential cutoff due to the finite time window. This is an interesting observation,
as one would expect rather different decision mechanisms behind adding and deleting a
contact. Nevertheless, the similarity of the two distributions indicates common
temporal features and bursty dynamics in both edge addition and deletion.
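As an illustration of how such an inter-event time distribution could be obtained from a time-stamped event log, the sketch below groups events by user, takes differences between consecutive timestamps, and builds a histogram with logarithmic (doubling) bins, which is a common choice for broad distributions. The record layout `(user_id, timestamp)`, the function names and the toy data are assumptions made for this example and are not taken from the study.

```python
from collections import defaultdict
import math


def inter_event_times(events):
    """Map each user to the list of gaps between their consecutive events.

    `events` is an iterable of (user_id, timestamp) pairs (hypothetical layout).
    """
    by_user = defaultdict(list)
    for user, t in events:
        by_user[user].append(t)
    taus = {}
    for user, times in by_user.items():
        times.sort()
        taus[user] = [t2 - t1 for t1, t2 in zip(times, times[1:])]
    return taus


def log_binned_density(values):
    """Estimate a density on doubling bins [2^k, 2^(k+1)).

    Counts are divided by the bin width and the sample size, so that the
    density integrates to one over the occupied bins.
    """
    values = [v for v in values if v > 0]
    counts = defaultdict(int)
    for v in values:
        counts[math.floor(math.log2(v))] += 1
    n = len(values)
    density = {}
    for k, c in sorted(counts.items()):
        lo, hi = 2 ** k, 2 ** (k + 1)
        density[(lo, hi)] = c / (n * (hi - lo))
    return density


# Toy event log: timestamps in days.
events = [("u1", 0), ("u1", 1), ("u1", 2), ("u1", 30), ("u2", 5), ("u2", 9)]
taus = inter_event_times(events)
pooled = [tau for user_taus in taus.values() for tau in user_taus]
density = log_binned_density(pooled)
```

On a log-log plot, a power-law section of such a density appears as a straight line whose slope estimates the exponent.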



III. EGOCENTRIC NETWORK EVOLUTION 

In this section we look at the evolution of egocentric networks to gain a deeper
understanding of the governing microscopic rules of contact list evolution.



A. Bursty edge dynamics 

To characterize the temporal evolution of contact lists, we first look at the edge
addition and deletion sequences of individuals and calculate the distributions of
inter-event times

$\tau_a = t^a_{i+1} - t^a_i$ and $\tau_d = t^d_{i+1} - t^d_i$ (1)



B. Trains of bursts 

The broad inter-event time distribution is indicative of the presence of
heterogeneities; however, it cannot show whether further correlations are present
between consecutive actions. Recently a new methodology was developed [13] to address
this question and to reveal correlations in heterogeneous binary signals. It counts
the number of consecutive events which follow each other with inter-event times
smaller than or equal to $\Delta t$. The number of events $E$ in such bursty clusters
serves as a sensitive measure of correlations, as its distribution scales as a
power law

$P(E) \sim E^{-\beta}$ (3)

if the signal is correlated and long event trains evolve in the dynamics. On the other
hand, if consecutive events are independent, it decays exponentially even if the
inter-event time distribution is fat-tailed.
FIG. 2. Distribution of the number of events in bursty trains of (a) contact addition
and (b) contact deletion of individuals in DS2. Distributions were calculated with time
window sizes $\Delta t$ = 1, 2, 4, 8, 16 and 32 days. Distributions calculated for
randomly shuffled sequences with the same $\Delta t$ values are also presented (dashed
lines). Straight lines indicate power-law functions with exponents (a) $\beta = 2.0$
and (b) $\beta = 1.8$.

In the present case we analyzed the edge modification sequence of each individual,
detected the clusters of edge addition and deletion events, and recorded their sizes
$E_a$ ($E_d$). The broad $P(E_a)$ and $P(E_d)$ distributions in Fig. 2 confirm the
presence of correlations between consecutive events of edge addition (deletion), as
the corresponding train sizes are distributed as a power law with characteristic
exponent $\beta_a \simeq 2.0$ ($\beta_d \simeq 1.8$). This scaling behaviour appears to
be robust against the selection of the window size $\Delta t$, as it remains similar
for distributions calculated with $\Delta t$ = 1, 2, 4, 8, 16 and 32 days.
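The train detection described above can be sketched in a few lines: events of one user are sorted in time, and a train continues as long as the gap to the next event does not exceed the window. The function name `train_sizes` and the toy timestamps are hypothetical; isolated events are counted here as trains of size one, which is an assumption of this sketch.

```python
def train_sizes(times, dt):
    """Return the sizes E of bursty trains in one user's event sequence.

    A train is a maximal run of events in which every consecutive gap is
    smaller than or equal to dt. Isolated events form trains of size 1.
    """
    times = sorted(times)
    if not times:
        return []
    sizes = []
    current = 1
    for prev, cur in zip(times, times[1:]):
        if cur - prev <= dt:
            current += 1            # event belongs to the running train
        else:
            sizes.append(current)   # gap too long: close the train
            current = 1
    sizes.append(current)
    return sizes


# Toy sequence of contact additions (timestamps in days).
days = [0, 1, 2, 10, 11, 40]
sizes_by_window = {dt: train_sizes(days, dt) for dt in (1, 2, 4, 8, 16, 32)}
```

Pooling the sizes over all users for a fixed window gives the empirical $P(E)$; repeating this for several windows is a direct way to check how sensitive the result is to $\Delta t$.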

The presence of correlations in the egocentric dynamics is even more apparent if we
compare the empirical $P(E)$ functions to the equivalent distributions calculated for
independent signals. To obtain such a reference system we took the inter-event times
$\tau_a$ ($\tau_d$) of the original sequences, put them in a pool and redrew for each
user, at random, as many inter-event times as they had originally. The inter-event
time distribution of the resulting randomly shuffled null model was similar to the
original $P(\tau)$ distribution, as we used the same $\tau_a$ ($\tau_d$) values, and
the shuffled sequence of each user contained the same number of events as before.
However, this random shuffling destroyed all temporal correlations that were present
between the consecutive events of single users. The $P(E)$ distribution of such
randomly shuffled sequences should decay exponentially [13], and any deviation from
this behaviour is indicative of correlations.
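A minimal sketch of such a shuffled reference system is given below. It assumes that the redrawing is done without replacement, i.e. the pooled inter-event times are permuted and dealt back to users, which keeps the pooled distribution exactly unchanged; the text does not specify this detail. The names `shuffled_null_model` and `event_times`, and the toy input, are hypothetical.

```python
import random
from itertools import accumulate


def shuffled_null_model(taus_by_user, seed=0):
    """Permute all inter-event times across users.

    Each user receives as many inter-event times as they had originally, so
    per-user event counts and the pooled distribution are both preserved,
    while the ordering within each user's sequence is randomized.
    """
    rng = random.Random(seed)
    pool = [tau for taus in taus_by_user.values() for tau in taus]
    rng.shuffle(pool)
    shuffled = {}
    start = 0
    for user, taus in taus_by_user.items():
        shuffled[user] = pool[start:start + len(taus)]
        start += len(taus)
    return shuffled


def event_times(first_time, taus):
    """Rebuild event timestamps from a start time and inter-event times."""
    return list(accumulate(taus, initial=first_time))


taus_by_user = {"u1": [1, 1, 28], "u2": [4], "u3": [2, 7]}
null_taus = shuffled_null_model(taus_by_user, seed=42)
null_events_u1 = event_times(0, null_taus["u1"])
```

The rebuilt timestamps can then be passed to the same train detection used for the empirical sequences, so that both distributions are measured in exactly the same way.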

Fig. 2a and b show that the $P(E)$ distributions calculated for the randomly shuffled
reference sequences (dashed lines) decay exponentially and are very different from the
original distributions (solid lines). This demonstrates that the actions of an
individual are not independent, and we can conclude that the evolution of egocentric
networks is not only heterogeneous in time but is also driven by intrinsic
correlations. These lead to the emergence of high-activity bursty periods, in which a
large number of edges are added or deleted, followed by long low-activity intervals.



IV. GROUPS OF INDIVIDUAL DYNAMICS 

So far we have observed that the edge addition and deletion events of an individual
are bursty and clustered in time, yet we know less about when these bursty trains
occur during the lifetime of a user. Do they appear at any time, are there typical
activity patterns of edge additions, or are they perhaps triggered by other user
actions? In the following we address these questions by seeking correlations of bursty
peaks with other user activities.

In order to compare the edge addition sequences of individuals we used DS1 and
concentrated only on the activity of users during the first year of their user time
$t_u$, i.e. the time elapsed since their registration. We kept track of the number of
newly added edges of each node $i$ with single-month resolution and obtained a
discrete sequence $a_i(t_u)$ for each individual, where $t_u = 1 \ldots 12$. To be able
to compare sequences of users with diverse overall intensity we applied the Symbolic
Aggregate Approximation (SAX) [20] method with alphabet size 10. Other methods, e.g.
combining standardization with the Discrete Wavelet Transform [3], were tested and
gave very similar results (not shown here). To detect groups of users with similar
edge addition dynamics we applied the k-means clustering [9] method to the activity
sequences using Euclidean distance. It has been shown that running k-means on data
previously processed by SAX produces better results than running it on the original
data [21]. To choose the number of clusters we ran the clustering for different values
of $k$ up to 200 and found that the sum of squared errors levels off at some point
where $k > 40$ (an approach also referred to as the elbow method). After inspecting
different clusterings manually, we determined $k = 44$ to be around optimal. The
k-means algorithm was chosen for its ability to scale to O(100M) data points. The aim
here was to use clustering for exploratory analysis of the data and not to find a
single definitive clustering; therefore we decided not to apply formal model selection
methods.
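To make the preprocessing step concrete, the following sketch shows the discretization at the core of SAX for a single 12-month sequence: the series is z-normalized and each value is mapped to one of 10 symbols using breakpoints that split the standard normal distribution into equiprobable regions. It assumes that no further piecewise aggregation is applied (the word length equals the 12 monthly points) and that the population standard deviation is used; neither detail is stated in the text. The function name `sax_symbols` and the toy sequence are hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_symbols(series, alphabet_size=10):
    """Z-normalize a series and map each point to a symbol in 0..alphabet_size-1."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        # A constant series carries no shape information: map it to zeros.
        z = [0.0] * len(series)
    else:
        z = [(x - mu) / sigma for x in series]
    # Breakpoints cut the standard normal into equiprobable regions.
    standard_normal = NormalDist()
    breakpoints = [standard_normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    return [bisect_right(breakpoints, v) for v in z]


# A user who adds most contacts in the first month after registration.
monthly_additions = [7, 3, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = sax_symbols(monthly_additions)

# A ten times more active user with the same shape gets the same symbols.
symbols_scaled = sax_symbols([10 * x for x in monthly_additions])
```

Because of the normalization, users with very different overall intensity but the same temporal shape map to the same symbol sequence, which is what allows a distance-based clustering of the symbol vectors to group them together.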

In Fig. 3 we show the average activity curves $\langle a_i(t_u) \rangle_k$ of each
cluster $k$ together with the percentage of users who belong to that group. Looking at
the most common patterns, it is clear that people typically perform their principal
(the largest and usually the only) edge addition burst right after they join the
network. This is the time when they discover their social acquaintances who have
already joined Skype, after which they add contacts only occasionally and with lower
frequency. This behaviour is confirmed by the average number of new edge additions
$\langle a_i(t_u) \rangle$ calculated over all users (Fig. 4a). Note that similar
behaviour was observed in other studies [5]. In addition, Fig. 4b shows that this
correlation is independent of the time of registration, as similar early-stage peaks
were found for users who joined Skype in different years.

FIG. 3. Characteristic groups of contact addition patterns calculated for DS1. On each
panel, red solid lines correspond to the average number of new contacts added in the
given month, dashed blue lines to the average number of connected days, and green
dotted lines to the average number of days on which users belonging to the given group
used Skype-to-Skype communication. The clustering was obtained from the contact
addition sequences only (red lines). Panels are arranged in descending order of the
fraction of users they cover (percentage above each panel). Left scales correspond to
the number of added contacts, while right scales correspond to the number of days. The
tick labelled $y_1$ ($y_2$) on the left (right) vertical scale corresponds to the same
value on each panel.

To check the significance of this phenomenon we compare the overall average curve
$\langle a_i(t_u) \rangle$ to a similar curve $\langle a_i(t_u) \rangle_r$ calculated
for independent sequences. To generate the null model sequences we apply a method very
similar to the earlier one. We take the activity values of each user in each month,
shuffle them randomly and redistribute them among users. This way the overall activity
remains unchanged and each user has a sequence of 12 data points as before, but the
correlations between activity peaks and user time are destroyed. The resulting average
activity curve becomes flat, as demonstrated in Fig. 4a (blue line). Comparing the
original and random curves (red and blue lines in Fig. 4a), it is clear that the peak
at early times, which is visible in the original curve, is absent from the null model
curve, supporting the significance of the correlation.
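The shuffling of monthly activity values can be illustrated with the sketch below, which pools all user-month values, permutes them, and deals them back as 12-point sequences before averaging per month. The function names and the synthetic population of 50 identical users are assumptions made purely for illustration.

```python
import random


def shuffle_activity(activity, seed=0):
    """Pool all monthly values of all users, permute, and redistribute.

    `activity` is a list of equally long per-user sequences. Total activity
    and sequence lengths are preserved; the link between a value and its
    position in user time is destroyed.
    """
    rng = random.Random(seed)
    pool = [v for seq in activity for v in seq]
    rng.shuffle(pool)
    n_months = len(activity[0])
    return [pool[i * n_months:(i + 1) * n_months] for i in range(len(activity))]


def average_curve(activity):
    """Average number of additions per user for each month of user time."""
    n_users = len(activity)
    n_months = len(activity[0])
    return [sum(seq[t] for seq in activity) / n_users for t in range(n_months)]


# Synthetic users who all have an early burst of contact additions.
activity = [[6, 2, 1] + [0] * 9 for _ in range(50)]
shuffled = shuffle_activity(activity, seed=1)
original_curve = average_curve(activity)
null_curve = average_curve(shuffled)
```

In this toy case the original curve peaks in the first month, while the null curve fluctuates around the overall monthly mean, mirroring the comparison between the red and blue lines described above.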



V. CORRELATIONS AT LATER TIMES 

A single bursty peak at early time is not characteristic of every user. Less common
motifs in Fig. 3 show that principal bursts may emerge later or even multiple times.
This observation indicates that events other than user registration may also trigger
immediate changes in the egocentric network. Changes in social status, such as moving
to another place or starting school, could be possible reasons behind the later bursty
peaks, but given the present dataset we are not in a position to confirm these
effects. However, it is possible to study dependencies between the dynamics of contact
addition and other system-related activities.

                      S2S        activ       S2S rand      activ rand
$\langle r \rangle$   0.34608    0.308137    8.31909e-6    1.69978e-5

TABLE I. Average correlation coefficients calculated between edge addition dynamics
and free service usage (S2S), and between edge addition dynamics and connected days
(activ). Values obtained for random sequences are also presented.
FIG. 4. Average number of new contacts added as a function of user time $t_u$,
calculated for (a) all users in DS1 and (b) users grouped by the time they joined
Skype. Age in the legend of panel (b) refers to how long before the measurement the
users registered.

In Fig. 3, besides the average edge addition rates, we also show the average number of
connected days and the average number of days of free-service usage for each group.
Comparing these curves, one can see some dependency between them, as users tend to
have their bursty contact addition months at times when they are connected and also
heavily using free services. To quantify these relationships we calculated a
correlation coefficient for each user $i$ defined as



$r_i = \frac{\langle (a_i(t) - \bar{a}_i)(s_i(t) - \bar{s}_i) \rangle_t}{\sigma_{a_i} \sigma_{s_i}}$ (4)

where $s_i(t)$ denotes the sequence of the number of connected days or free service
usage days, $\sigma_{a_i}$ and $\sigma_{s_i}$ are the standard deviations of the two
sequences (the normalization of the Pearson coefficient), and the average runs over
the 12 discrete time steps. The cumulative distributions CDF($r$) of the two
coefficients calculated for every user are depicted in Fig. 5 (dark red curve for
correlations with Skype-to-Skype free services (S2S) and dark blue curve for
correlations with connected days (activ)). They indicate mostly positive correlations,
as almost no user was found with coefficient $r < -0.5$ and at the same time
approximately 80% of users present non-negative correlations in both cases. The
average correlation coefficients $\langle r \rangle$ of the two distributions also
indicate strong positive correlations, as they take the values
$\langle r_{S2S} \rangle = 0.34608$ and $\langle r_{activ} \rangle = 0.308137$ for free
service usage and user activity, respectively (vertical dashed lines in Fig. 5).
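A per-user Pearson coefficient between a contact addition sequence and a usage sequence, together with an empirical cumulative distribution over users, could be computed as in the sketch below. It uses population standard deviations and returns `None` for constant sequences, for which the coefficient is undefined; how such users were handled in the study is not stated, so this is an assumption. The function names and toy sequences are hypothetical.

```python
from statistics import mean, pstdev


def pearson(a, s):
    """Pearson correlation between two equally long sequences.

    Returns None when either sequence is constant, because the coefficient
    is undefined for zero variance.
    """
    a_bar, s_bar = mean(a), mean(s)
    sigma_a, sigma_s = pstdev(a), pstdev(s)
    if sigma_a == 0 or sigma_s == 0:
        return None
    covariance = mean([(x - a_bar) * (y - s_bar) for x, y in zip(a, s)])
    return covariance / (sigma_a * sigma_s)


def empirical_cdf(values, x):
    """Fraction of values that are smaller than or equal to x."""
    return sum(1 for v in values if v <= x) / len(values)


# Toy users: monthly contact additions and monthly days of service usage.
additions = [5, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
usage_days = [20, 12, 8, 3, 2, 2, 1, 0, 1, 0, 0, 0]
r_similar = pearson(additions, usage_days)
r_opposite = pearson(additions, [-x for x in additions])
r_constant = pearson(additions, [4] * 12)

coefficients = [r for r in (r_similar, r_opposite) if r is not None]
share_negative = empirical_cdf(coefficients, 0.0)
```

Evaluating `empirical_cdf` on a grid of values between -1 and 1 gives a curve of the kind shown for the correlation coefficients.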

To check whether the observed positive correlations are significant or only the result
of random fluctuations of independent processes, we calculated the correlations
between the same curves after randomly shuffling the activity sequences. To generate
the null model sequences we applied the method described in Section IV to each
sequence. The cumulative distribution functions of the correlation coefficients
calculated between the random sequences are shown in Fig. 5 (light red line for free
service usage and light blue line for user activity). These curves are very different
from those of the original sequences, and their average values also differ markedly.
For random sequences
$\langle r^{rand}_{S2S} \rangle \simeq \langle r^{rand}_{activ} \rangle \simeq 0$, in
agreement with the value expected for correlations between independent signals (the
average values for all calculated CDF($r$) are summarized in Table I). Consequently,
the observed dependencies between the original sequences are significant, and they
indicate true positive correlations between the edge addition dynamics of individuals,
their user activity and their free service usage.

FIG. 5. Cumulative distribution of the Pearson correlation coefficients $r$ calculated
between the contact addition dynamics of individuals and their service usage intensity
and user activity during their first $t_u = 12$ months (solid dark red and dark blue
lines, respectively). Similar curves calculated for random sequences are also shown
(solid light red and light blue lines, respectively). Dashed vertical lines mark the
average correlation coefficients $\langle r \rangle$ of the distributions (for
numerical values see Table I).
FIG. 6. Matrices of conditional probabilities that a user had a peak contact addition
month at $T_a$ given that they adopted (a) a free or (b) a paid service at $T_s$.
Probability values at $T_a = 0$ and $T_s = 0$ are not shown. Colors code the
probability values logarithmically.
Another possible reason behind sudden changes in the egocentric graph is the adoption
of a new communication service which opens an alternative way of interaction. This
channel could be a free communication service that the user has discovered or a paid
service that he/she has subscribed to. Therefore, one can look for the time when a
user starts to use a free or paid service for the first time and check whether these
actions can trigger bursty peaks in contact addition. To do so we identify bursty peak
months for each user as the months where the contact addition activity satisfies

$a^p_i(t) = \{ a_i(t) \mid a_i(t) > \bar{a}_i + 2\sigma_{a_i} \}$. (5)

Here $\bar{a}_i$ and $\sigma_{a_i}$ denote the average and standard deviation of the
contact addition activity sequence of user $i$. In Fig. 6a and b we present the
conditional probabilities that a user has a peak month of contact addition at a given
user time $T_a$ if he/she adopted a free (Fig. 6a) or a paid service (Fig. 6b) at time
$T_s$. We have seen earlier that strong correlations exist between user activity and
registration time at $t_u = 0$. Here, however, we are looking for correlations which
evolve later in user time. To avoid the dominating effect of the strongly correlated
early activity peaks, in Fig. 6 we neglect the data bins belonging to adoption time
$T_s = 0$ and bursty peak time $T_a = 0$ and normalize the probabilities accordingly.
After this preparation, correlations between late service adoption and peak contact
addition months become visible as a high-activity diagonal in the probability matrices
of Fig. 6a and b. Consequently, together with the correlations with registration and
user activity, this result offers another possible explanation for the late emergence
of bursty contact addition peaks, as they are possibly triggered by paid or free
service adoption.
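The peak-month criterion and a conditional probability matrix of the kind discussed above can be sketched as follows. Months are indexed from 0 (the registration month), the population standard deviation is used, and the probability is normalized by the number of users who adopted the service in a given month; the exact indexing and normalization used in the study are not specified, so all of these are assumptions, as are the function names and toy users.

```python
from collections import Counter
from statistics import mean, pstdev


def peak_months(a):
    """Months whose activity exceeds the user's mean by more than 2 standard deviations."""
    threshold = mean(a) + 2 * pstdev(a)
    return [t for t, value in enumerate(a) if value > threshold]


def conditional_peak_probability(users):
    """Estimate P(peak month at T_a | service adopted at T_s).

    `users` is a list of (activity_sequence, adoption_month) pairs. Bins with
    T_a == 0 or T_s == 0 are skipped to suppress the early registration peak.
    """
    adopters = Counter()
    joint = Counter()
    for activity, t_s in users:
        if t_s == 0:
            continue
        adopters[t_s] += 1
        for t_a in peak_months(activity):
            if t_a != 0:
                joint[(t_a, t_s)] += 1
    return {(t_a, t_s): n / adopters[t_s] for (t_a, t_s), n in joint.items()}


users = [
    ([0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0], 5),  # late burst in adoption month
    ([1] * 12, 5),                              # flat activity, no peak month
    ([9] + [0] * 11, 0),                        # early adopter, skipped
]
probabilities = conditional_peak_probability(users)
```

A concentration of probability on the entries with $T_a = T_s$ corresponds to the diagonal pattern described in the text.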

VI. RELATED WORK 

The temporal evolution of networks has been studied thoroughly in recent years as
datasets recording the dynamics of millions of interacting entities have become
available jTUj. One of the most investigated areas has been the evolution of large
social networks [3 [T7[ [TH1 HH], where it has been shown that several mechanisms push
such networks towards developing heterogeneous topologies and strongly modular
structures [TJ [TH1 HBJ. In addition, various methodologies have been developed to
detect evolving mesoscopic patterns [IBJ 123] and emerging community structures [BJ.
Our study falls under the same umbrella as these previous works but focuses on the
temporal evolution of egocentric networks.

Heterogeneities in the dynamics of social interactions have been observed by following
the communication sequences of individuals [H EJ [221 HE]. Circadian fluctuations and
long-range temporal correlations were shown to play an important role here
[JTJ [T31 HH [27] and they partially explain the observed non-homogeneous behaviour.
Lately, heterogeneous evolution of social networks was also reported by Gaito et al.
[BJ, who analyzed the dynamics of the Renren online social network. In that paper
Gaito et al. [BJ arrived, simultaneously with us, at similar conclusions regarding the
burstiness in the evolution of contact addition of users. In our study, beyond
confirming this effect in an independent dataset, we extend this finding in two ways.
First, we show that bursty trains also evolve in the sequence of contact deletions of
individuals, and second, we highlight non-trivial correlations triggering bursty
periods in the evolution of egocentric networks.



VII. CONCLUSIONS 

Our motivation in this study was to understand the temporal evolution of egocentric
networks in an online environment by examining the contact modification dynamics of
individuals. We investigated for the first time one of the largest online social
networks currently available, the network of Skype users. Our main observation was
that the dynamics of edge addition and deletion show strongly heterogeneous temporal
behaviour, as most of the edges are added or deleted during very short bursty periods
which are separated by long low-activity intervals. During such high-activity periods
long bursty trains of contact addition events can evolve, confirming the presence of
intrinsic correlations. We also concluded that such trains are strongly related to the
registration time, as they are most likely to appear right after the user joins the
network. High-activity bursty peaks which evolved later, and even multiple times, were
also detected for some users. We showed that such patterns are correlated with user
activity and free service usage and could be triggered by free and paid service
adoption. The observed temporal behaviour and non-trivial correlations disclose
characteristics of the evolution of social networks which fit well into the general
picture of human dynamics, as correlations and heterogeneity have been confirmed
earlier in many independent cases. However, beyond refining our present assumptions
about human behaviour, these results have a more pragmatic benefit, as they may help
to improve the design of online services and to allocate shared computational and
communication network resources more efficiently.